1 Introduction

2 Learning objectives

@TASK

  1. You can explain the key characteristics of an sf map object.
  2. You understand that geospatial features are represented as points, linestrings and polygons.
  3. You can use {ggplot} and geom_sf() to map points, linestrings and polygons

3 Packages

This lesson requires the following packages:

if(!require('pacman')) install.packages('pacman')
pacman::p_load_gh("afrimapr/afrilearndata")
pacman::p_load_gh("ropensci/rnaturalearthhires")
pacman::p_load(ggspatial,
               ggplot2,
               rnaturalearth,
               spData,
               mapview,
               tidyverse)

4 Preamble: mapping country borders with rnaturalearth

As an introduction to plotting maps in R, let’s look at how to draw a simple world map with country borders. The package {rnaturalearth} contains information to map all the countries in the world, among other things.

To obtain this map information, use the ne_countries() function, with the argument returnclass = "sf".

countries <- ne_countries(returnclass = "sf")

The code returns an sf object with the shapes for all countries. We will soon explain in detail what an “sf” object is. For now, just note that it is a special kind of data frame for geospatial data in R:

class(countries)
## [1] "sf"         "data.frame"

Now, the countries object can be plotted very easily with the geom_sf function of {ggplot2}:

ggplot(data = countries) + 
  geom_sf()

Wonderful! (Almost too easy!)

To subset to a specific continent, use the continent argument of ne_countries():

# Countries in South America
south_am <- ne_countries(returnclass = "sf", 
                         continent = "south america")

ggplot(data = south_am) + 
  geom_sf()

The continent argument can accept a vector with multiple continents:

# Countries in north and south america
north_south_am <- ne_countries(returnclass = "sf", 
                               continent = c("north america", "south america"))

ggplot(data = north_south_am) +
  geom_sf()

To subset to a specific country or specific countries, use the country argument:

# Map of Nigeria and Niger
nigeria_niger <- ne_countries(returnclass = "sf", 
                              country = c("nigeria", "niger"))

ggplot(data = nigeria_niger) +
  geom_sf()

◘ Use ne_countries(), ggplot() and geom_sf() to plot a single map of the countries in Asia and Africa

q_asia_africa <- ne_countries(returnclass = "sf", 
                              continent = c("asia", "africa"))

ggplot(data = asia_africa) +
  geom_sf()

◘ Use ne_countries(), ggplot() and geom_sf() to plot a single map of the national borders of China and Indonesia

china_indonesia <- ne_countries(returnclass = "sf", 
                                country = c("china", "indonesia"))

ggplot(data = china_indonesia) +
  geom_sf()

5 Understanding sf objects

So far we’ve been passing these sf objects into ggplot without thinking about their underlying structure. Let’s now look under the hood to understand sf objects better. In the process, you will encounter several key concepts you’ll need to understand when working with geospatial data in R.

First of all, what does the acronym “sf” mean? It stands for Simple Features, which is a set of widely-used standards for storing geospatial information in databases. The details of these standards are beyond the scope of this course; just know that the {sf} R package was written to bring spatial data analysis in R closer towards these Simple Features standards. {sf} largely replaces an older R package for geospatial analysis {sp}, although you may still need to use {sp} once in a while.

Now, what do sf objects look like and how do we work with them? To answer this, we’ll look at a slice of the countries object previously defined. Since this sf object is just a special kind of data frame, we can manipulate it with standard functions like dplyr::select(). So let’s select just three columns to make the object easier to observe:

countries %>% 
  select(name, # country name
         pop_est, # estimated population
         geometry)
## Simple feature collection with 177 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## Geodetic CRS:  SOURCECRS
## First 10 features:
##                     name  pop_est                       geometry
## 0            Afghanistan 28400000 MULTIPOLYGON (((61.21082 35...
## 1                 Angola 12799293 MULTIPOLYGON (((16.32653 -5...
## 2                Albania  3639453 MULTIPOLYGON (((20.59025 41...
## 3   United Arab Emirates  4798491 MULTIPOLYGON (((51.57952 24...
## 4              Argentina 40913584 MULTIPOLYGON (((-65.5 -55.2...
## 5                Armenia  2967004 MULTIPOLYGON (((43.58275 41...
## 6             Antarctica     3802 MULTIPOLYGON (((-59.57209 -...
## 7 Fr. S. Antarctic Lands      140 MULTIPOLYGON (((68.935 -48....
## 8              Australia 21262641 MULTIPOLYGON (((145.398 -40...
## 9                Austria  8210281 MULTIPOLYGON (((16.97967 48...

What do we see? The object consists of a 5-line header and a data frame.

6 The sf header

The header provides some contextualizing information about the rest of the object. You usually don’t need to pay too much attention to this header, but we will go through it in some detail here, because it provides an opportunity to learn a few key terms in geospatial analysis.

Let’s go line-by-line through this header to see what these terms mean:

6.1 Features and fields

The first line of the header tells you the number of features and fields in the sf object:

👉Simple feature collection with 177 features and 2 fields👈
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
Geodetic CRS: SOURCECRS

Features are simply the geographical objects represented by each row of the data frame. In our countries dataset, each country has its own row; therefore each country is a feature.

And what are fields? These are the attributes that pertain to each feature in the data. In our countries dataset, the fields include “name”, the name of each country, and “pop_est”, its estimated population. Fields are essentially equivalent to columns in the data frame, although the “geometry” column does not count as a field.

Fig: Features and fields of an `sf` object

The spData::nz dataset contains mapping information for the regions of New Zealand. How many features and fields does the dataset have?

# Delete the wrong lines and run the correct line:
q_nz_features_fields <- "A. 16 features and 6 fields"
q_nz_features_fields <- "B. 20 features and 5 fields"
q_nz_features_fields <- "C. 5 features and 4 fields"

6.2 Geometry (points, lines and polygons)

The second line of the header gives you the type of geometry in the sf object:

Simple feature collection with 177 features and 2 fields
👉Geometry type: MULTIPOLYGON👈
Dimension: XY
Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
Geodetic CRS: SOURCECRS

“Geometry” is essentially a synonym for “shape”. There are three main geometry types: points, lines and polygons. Each of these has its respective “multi” version: multipoints, multilines and multipolygons.

The figure below outlines these main types of geometries.

Fig: Geometry types and example maps for each. Points, lines (or linestrings) and polygons are the most common geometries you will encounter.

In the case of the countries object, the geometry type is stated as “MULTIPOLYGON”. What does this mean? Some countries are made of just one polygon (that is, a single bounded shape), but others are composed of multiple polygons. For example, Indonesia is composed of several islands, each of which is its own polygon. But each sf object can have only one geometry type, so all countries are “forced” to have the MULTIPOLYGON geometry type.

So now you’re familiar with what “polygon” geometries look like. Let’s then look at geospatial datasets with “point” and “linestring” geometries.

Points

As described in the image above, points are simply individual x,y locations. They could represent the location of a case (e.g. a patient home), small objects like wells or trees, or the center points (called “centroids”) of larger objects like buildings or entire cities. Let’s see one example now.

The ne_download() function from {rnaturalearth} can be used to obtain geospatial information for mapping some major world ports:

world_ports <- 
  ne_download(scale = 10,
              type = "ports", 
              returnclass = "sf") 
## OGR data source with driver: ESRI Shapefile 
## Source: "/private/var/folders/vr/shb6ffvj2rl61kh7qqczhrgh0000gn/T/Rtmp3w83vy", layer: "ne_10m_ports"
## with 1081 features
## It has 6 fields
## Integer64 fields read as strings:  ne_id

(Do consult the {rnaturalearth} documentation at docs.ropensci.org/rnaturalearth to learn how to use ne_download() to download many other interesting geospatial datasets.)

When we print this world_ports object, we can see the “POINT” geometry is referenced on line 2 of the header, and also in each row of the geometry column:

world_ports %>% 
  select(name, geometry) %>% 
  head()
## Simple feature collection with 6 features and 1 field
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -69.92356 ymin: -38.89444 xmax: -58.95141 ymax: 12.4375
## Geodetic CRS:  WGS 84
##                           name                    geometry
## 1                Sint Nicolaas   POINT (-69.92356 12.4375)
## 2                      Campana POINT (-58.95141 -34.15333)
## 3                       Zarate POINT (-59.00495 -34.09889)
## 4 Puerto Belgrano/Bahia Blanca POINT (-62.10088 -38.89444)
## 5   Puerto Galvan/Bahia Blanca POINT (-62.30053 -38.78306)
## 6 Ingeniero White/Bahia Blanca POINT (-62.25989 -38.79194)

Each point has two coordinates—in this case longitude and latitude—that describe its position. For the first feature, “Sint Nicolaas”, the point coordinates are -69.92356 and 12.4375. The first number, -69.9…, refers to longitude 69.9° West, and 12.4… refers to latitude 12.4° North.

As before, we can plot this world_ports object directly with ggplot() and geom_sf(), to map the locations of all points:

ggplot(data = world_ports) + 
  geom_sf()

Very cool! You have now plotted the point geometries representing the major world ports.

Lines

Lines or linestrings consist of two or more interconnected points. They are used to represent objects like roads, rivers or utility lines. Let’s look at one example.

The ne_download() function from {rnaturalearth} can be used to obtain geospatial information for mapping some major world rivers:

world_rivers <-
  ne_download(scale = 50,
              category = "physical",
              type = "rivers_lake_centerlines", 
              returnclass = "sf") 
## OGR data source with driver: ESRI Shapefile 
## Source: "/private/var/folders/vr/shb6ffvj2rl61kh7qqczhrgh0000gn/T/Rtmp3w83vy", layer: "ne_50m_rivers_lake_centerlines"
## with 478 features
## It has 36 fields
## Integer64 fields read as strings:  ne_id

When we print this world_rivers object, we can see the “MULTILINESTRING” geometry referenced on line 2 of the header, and in each row of the geometry column:

world_rivers %>% 
  select(name_en, geometry) %>% 
  head()
## Simple feature collection with 6 features and 1 field
## Geometry type: MULTILINESTRING
## Dimension:     XY
## Bounding box:  xmin: -91.97004 ymin: 11.63229 xmax: 56.68579 ymax: 60.50012
## Geodetic CRS:  WGS 84
##       name_en                       geometry
## 0        Kama MULTILINESTRING ((51.93713 ...
## 1        Kama MULTILINESTRING ((53.69385 ...
## 2 Lesser Abay MULTILINESTRING ((37.11301 ...
## 3   Euphrates MULTILINESTRING ((38.56119 ...
## 4     Alabama MULTILINESTRING ((-86.52177...
## 5      Albany MULTILINESTRING ((-91.31225...

The geometry type is “multilinestring”, rather than just “linestring”, because some rivers have breaks or forks along their length, and so must be represented by more than one line. For example, the “Chico River” in the Philippines (from the world_rivers dataset we downloaded) has two distinct segments, each of which must be represented by its own linestring:

As with the polygon and point geometry objects, we can plot this object directly with ggplot() and geom_sf():

ggplot(world_rivers) + 
  geom_sf()

Nice!

It is possible (though rarely necessary) to extract and print to console the entire list of coordinates for a linestring or polygon:

## Print the linestring for the first row in `world_rivers`
world_rivers[["geometry"]][[1]] 

MULTILINESTRING ((22.75649 -20.47161, 22.79458 -20.43499, 22.80981 -20.42571, 22.85693 -20.40462, 22.91909 -20.38948, 22.93091 -20.38372, 22.98672 -20.34564, 23.01089 -20.33577, 23.10303 -20.31097, 23.11929 -20.30423, 23.27441 -20.15775, 23.32788 -20.1242))

In this linestring, the first point has a coordinates 22.75649 and -20.47161, the second point has coordinates 22.79458 -20.43499, and so on.

The tables below shows the “raw” coordinate representations for other geometries:

Points, lines and polygons

The afrilearndata::afriairports contains geographic data on airports in Africa.

◘ What type of geometry is used to represent each capital?

# Delete the wrong lines and run the correct line:
q_airports_geom_type <- "LINESTRING"
q_airports_geom_type <- "POLYGON"
q_airports_geom_type <- "POINT"

◘ Plot the airports with ggplot() and geom_sf().

q_airports_plot <- 
ggplot(data =   afrilearndata::afriairports) + 
  geom_sf()

q_airports_plot

The ne_download() function from {rnaturalearth} can be used to obtain a map of major world roads, using the code below:

roads <- 
  ne_download(scale = 10, 
              category = "physical",
              type = "geographic_lines", 
              returnclass = "sf") 

◘ What type of geometry is used to represent the rivers?

# Delete the wrong lines and run the correct line:
q_rivers_geom_type <- "MULTILINESTRING"
q_rivers_geom_type <- "MULTIPOLYGON"
q_rivers_geom_type <- "MULTIPOINT"

◘ Plot the rivers with ggplot() and geom_sf().

q_rivers_plot <- 
  ggplot(data = rivers) + 
  geom_sf()

q_rivers_plot

Great work! Hopefully you now understand the three main kinds of geometry and how these can be plotted. Soon you will see how to add additional information to these geometries to create thematic maps.

6.3 Dimension

The third header line gives the “dimension” of the sf object:

Simple feature collection with 177 features and 2 fields
Geometry type: MULTIPOLYGON
👉Dimension: XY👈
Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
Geodetic CRS: SOURCECRS

This simply refers to the Cartesian coordinate system for this object. An object with two-dimensional features has XY coordinates, while one with three-dimensional features has XYZ coordinates.

In this course chapter we will not deal with any three-dimensional features; so the “XY” dimension is all you will see.

6.4 Bounding box

The fourth line of the header provides the coordinates of the sf object’s bounding box:

Simple feature collection with 177 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
👉Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513👈
Geodetic CRS: SOURCECRS

The bounding box is the smallest rectangle that can surround all the features in the object. The box is represented by its lower-left and upper-right coordinates, as illustrated in the figure below:

Fig: An example bounding box, for the country of Turkey

For our countries map, the bounding box is written in terms of longitude and latitude: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513.

This is equivalent to: 180°W, 90°E, 180°E, 83.64513°S.

The spData::nz dataset contains mapping information for the regions of New Zealand. Observe the bounding box for this object. Does it appear to be in terms of latitude and longitudes?

# Delete the wrong lines and run the correct line:
q_nz_bounding_box <- "A. No, it's not in terms of latitude and longitude."
q_nz_bounding_box <- "B. Yes, it is in terms of latitude and longitude."

6.5 Geodetic CRS

The final header line tells us what coordinate reference system used.

Simple feature collection with 177 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
👉Geodetic CRS: SOURCECRS👈

You can ignore this for now; it will be the focus of a later chapter.


Phew! That was a lot of information. Congratulations on getting through it.

In the process of exploring the components of an sf header, we have also explained several of the core concepts you will need to know for your geospatial work. We will refer back to these concepts throughout this course chapter.

7 The sf data frame

Now that we have covered most of the terms in the sf header, let’s consider the actual data frame that holds the geospatial information.

First, we’ll define a new object, countries_select, with only three columns, that we can manipulate and view easily.

countries_select <- 
  ne_countries(returnclass = "sf") %>% 
  select(name, # country name
         pop_est)  # estimated population

head(countries_select)
## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -73.41544 ymin: -55.25 xmax: 75.15803 ymax: 42.68825
## Geodetic CRS:  SOURCECRS
##                   name  pop_est                       geometry
## 0          Afghanistan 28400000 MULTIPOLYGON (((61.21082 35...
## 1               Angola 12799293 MULTIPOLYGON (((16.32653 -5...
## 2              Albania  3639453 MULTIPOLYGON (((20.59025 41...
## 3 United Arab Emirates  4798491 MULTIPOLYGON (((51.57952 24...
## 4            Argentina 40913584 MULTIPOLYGON (((-65.5 -55.2...
## 5              Armenia  2967004 MULTIPOLYGON (((43.58275 41...

7.1 The geometry column

The primary property that makes an sf data frame soecial is its geometry column. This is the column that holds the core geospatial data (points, linestrings or polygons). Let’s go through some noteworthy points about this column.

The geometry column can be renamed

In all our previous examples, the geometry column was literally called “geometry”, but this does not have to be the case. We can rename it to any arbitrary name. For example, below, we rename the “geometry” column to “geographic_info”:

countries_new_col_name <- 
  countries_select %>% 
  rename(geographic_info = geometry)

head(countries_new_col_name)
## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -73.41544 ymin: -55.25 xmax: 75.15803 ymax: 42.68825
## Geodetic CRS:  SOURCECRS
##                   name  pop_est                geographic_info
## 0          Afghanistan 28400000 MULTIPOLYGON (((61.21082 35...
## 1               Angola 12799293 MULTIPOLYGON (((16.32653 -5...
## 2              Albania  3639453 MULTIPOLYGON (((20.59025 41...
## 3 United Arab Emirates  4798491 MULTIPOLYGON (((51.57952 24...
## 4            Argentina 40913584 MULTIPOLYGON (((-65.5 -55.2...
## 5              Armenia  2967004 MULTIPOLYGON (((43.58275 41...

As usual, {ggplot} will recognize and plot this column, despite the new name:

ggplot(countries_new_col_name) + 
  geom_sf()

The geometry column can’t be dropped

The geometry column cannot normally de dropped.

Recall that further above, we ran the following select() statement that asked for only two columns, “name” and “pop_est”:

countries_select <-
  countries %>% 
  select(name, # country name
       pop_est)  # estimated population

but this command returned three columns: “name”, “pop_est” and “geometry”. Why?

Since the “geometry” column holds the core geospatial data for an sf object, it is always included—it cannot be dropped in the normal ways.

To drop this column, you have to convert the object to a regular data frame with as.data.frame() or as_tibble() first:

countries_tibble <- as_tibble(countries_select)
head(countries_tibble)
## # A tibble: 6 × 3
##   name                  pop_est                                         geometry
##   <chr>                   <dbl>                               <MULTIPOLYGON [°]>
## 1 Afghanistan          28400000 (((61.21082 35.65007, 62.23065 35.27066, 62.984…
## 2 Angola               12799293 (((16.32653 -5.87747, 16.57318 -6.622645, 16.86…
## 3 Albania               3639453 (((20.59025 41.8554, 20.46318 41.51509, 20.6051…
## 4 United Arab Emirates  4798491 (((51.57952 24.2455, 51.75744 24.29407, 51.7943…
## 5 Argentina            40913584 (((-65.5 -55.2, -66.45 -55.25, -66.95992 -54.89…
## 6 Armenia               2967004 (((43.58275 41.09214, 44.97248 41.24813, 45.179…
countries_tibble %>% 
  select(name, pop_est) %>% 
  head()
## # A tibble: 6 × 2
##   name                  pop_est
##   <chr>                   <dbl>
## 1 Afghanistan          28400000
## 2 Angola               12799293
## 3 Albania               3639453
## 4 United Arab Emirates  4798491
## 5 Argentina            40913584
## 6 Armenia               2967004

(Of course, you usually should not need to drop the geometry column from your sf objects.

geom_sf() automatically recognizes the geometry column

So far you have been passing the sf object into geom_sf() without adding any arguments, for example:

ggplot(data = countries_select) + 
  geom_sf()

This is because geom_sf() automatically recognizes the geometry column and passes it in as an aesthetic. You can make this more explicit, by passing the column manually to the geometry argument:

ggplot(data = countries_select) + 
  geom_sf(mapping = aes(geometry = geometry))

For the countries_new_col_name defined above, in which we renamed geometry column, the code would look like this:

ggplot(data = countries_new_col_name) + 
  geom_sf(mapping = aes(geometry = geographic_info))

7.2 Filtering an sf data frame

Apart from the geometry column, sf data frames are like regular data frames, and can be manipulated as such. For example, we can filter the data frame to specific rows in order to plot only specific features in the dataset. Let’s try this now.

Recall that further above, we used the country argument in ne_countries() to return a subset of the countries provided by {rnaturalearth}. For example, to return the map of Nigeria and Niger, we ran:

# Map of Nigeria and Niger
nigeria_niger <- ne_countries(type = "countries", returnclass = "sf", 
                         country = c("nigeria", "niger"))

ggplot(data = nigeria_niger) +
  geom_sf()

We can now achieve the same result by taking the data frame for all countries and using dplyr::filter() to filter it down to just Nigeria and Niger:

nigeria_niger_filtered <- 
  countries_select %>% 
  filter(name %in% c("Nigeria", "Niger"))

nigeria_niger_filtered
## Simple feature collection with 2 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 0.2956464 ymin: 4.240594 xmax: 15.90325 ymax: 23.47167
## Geodetic CRS:  SOURCECRS
##      name   pop_est                       geometry
## 1   Niger  15306252 MULTIPOLYGON (((2.154474 11...
## 2 Nigeria 149229090 MULTIPOLYGON (((8.500288 4....

As before, we can plot this object with ggplot() and geom_sf():

ggplot(data = nigeria_niger_filtered) + 
  geom_sf()

Excellent! So you see now that sf objects can be manipulated like data frames: we can select specific columns and filter specific rows, for example. Soon we will see how to perform other data frame operations on sf objects, like mutating, summarizing and joining.

For now, try out the practice questions below.

◘ From the afrilearndata::afriairports dataset, filter then plot only the airports in South Africa. Use filter(), ggplot() and geom_sf()

q_south_africa_airports_plot <- 
  afrilearndata::afriairports %>% 
  filter(country_name == "South Africa") %>% 
  ggplot() + 
  geom_sf()
q_south_africa_airports_plot

◘ From the rivers dataset previously defined, filter then plot the two features named “Nile”

q_nile <- 
  rivers %>% 
  filter(name == "Nile") %>% 
  ggplot() + 
  geom_sf()

q_nile

8 Visualizing sf objects with mapview::mapview()

@INCOMPLETE

A nice way to get a quick overview of all the fields and features in a map object is with mapview::mapview()

When you click on any feature, you see a list of fields (attributes):

[SCREENSHOT]

9 Wrap up

@INCOMPLETE

Contributors

The following team members contributed to this lesson:

References

Some material in this lesson was adapted from the following sources:

This work is licensed under the Creative Commons Attribution Share Alike license. Creative Commons License